knitr::opts_chunk$set(echo = TRUE)
Obtain the working directory.
getwd()
Read in the SPRUCE.csv data.
spruce.df <- read.csv("SPRUCE.csv")
head(spruce.df)
Plot and interpret the spruce data.
with(spruce.df, { plot(Height~BHDiameter,bg="Blue",pch=21,ylim=c(0,1.1*max(Height)),xlim=c(0,1.1*max(BHDiameter)), cex=1.2, main="Spruce Breast Height Diameter vs Height") } )
The relationship between Height and BHDiameter appears roughly linear, although the points do not fall exactly along a straight line.
library(s20x)
trendscatter(Height~BHDiameter, f=0.5, data=spruce.df)
trendscatter(Height~BHDiameter, f=0.6, data=spruce.df)
trendscatter(Height~BHDiameter, f=0.7, data=spruce.df)
spruce.lm=with(spruce.df, lm(Height~BHDiameter))
with(spruce.df, {
  plot(Height~BHDiameter,bg="Blue",pch=21,ylim=c(0,1.1*max(Height)),xlim=c(0,1.1*max(BHDiameter)),
       cex=1.2, main="Spruce Breast Height Diameter vs Height")
})
abline(spruce.lm)
The straight line does not represent the data accurately. The trendscatter curve appears to fit better, although more observations at low BHDiameter values would make this easier to judge.
Determine sums of squares for the spruce data.
layout(matrix(1:4,nrow=2,ncol=2,byrow=TRUE))
# Panel 1: data with the fitted regression line
with(spruce.df, {plot(Height~BHDiameter,bg="Blue",pch=21,ylim=c(0,1.1*max(Height)),xlim=c(0,1.1*max(BHDiameter)), cex=1.2)})
abline(spruce.lm)
# Panel 2: observed minus fitted deviations (these are squared and summed in RSS)
with(spruce.df, {plot(Height~BHDiameter,bg="Blue",pch=21,ylim=c(0,1.1*max(Height)),xlim=c(0,1.1*max(BHDiameter)), cex=1.2)})
with(spruce.df, {segments(BHDiameter,Height,BHDiameter,fitted(spruce.lm),col="Blue")})
abline(spruce.lm)
# Panel 3: mean minus fitted deviations (MSS)
with(spruce.df, {plot(Height~BHDiameter,bg="Blue",pch=21,ylim=c(0,1.1*max(Height)),xlim=c(0,1.1*max(BHDiameter)), cex=1.2)})
with(spruce.df, {segments(BHDiameter,mean(Height),BHDiameter,fitted(spruce.lm),col="Red")})
abline(spruce.lm)
abline(h=mean(spruce.df$Height))
# Panel 4: observed minus mean deviations (TSS)
with(spruce.df, {plot(Height~BHDiameter,bg="Blue",pch=21,ylim=c(0,1.1*max(Height)),xlim=c(0,1.1*max(BHDiameter)), cex=1.2)})
with(spruce.df, {segments(BHDiameter,Height,BHDiameter,mean(Height),col="Green")})
abline(h=mean(spruce.df$Height))
# Residual, model, and total sums of squares
RSS=with(spruce.df, sum((Height-fitted(spruce.lm))^2))
RSS
MSS=with(spruce.df, sum((mean(Height)-fitted(spruce.lm))^2))
MSS
TSS=with(spruce.df, sum((Height-mean(Height))^2))
TSS
MSS/TSS
MSS/TSS is the proportion of the total variation in Height that the model accounts for (the multiple R-squared).
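For reference, the three sums of squares computed in the code can be written out as follows. The symbols are notation introduced here for illustration and do not appear in the lab code: $y_i$ is an observed Height, $\hat{y}_i$ is the corresponding fitted value from spruce.lm, and $\bar{y}$ is the mean Height.

```latex
\begin{aligned}
\mathrm{RSS} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \\
\mathrm{MSS} &= \sum_{i=1}^{n} (\bar{y} - \hat{y}_i)^2, \\
\mathrm{TSS} &= \sum_{i=1}^{n} (y_i - \bar{y})^2, \\
R^2 &= \frac{\mathrm{MSS}}{\mathrm{TSS}}.
\end{aligned}
```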
TSS
MSS+RSS
TSS is equal to MSS+RSS.
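The identity can also be checked numerically rather than by eye. The sketch below is a minimal illustration that uses R's built-in cars data set as a stand-in (an assumption, so that it runs without SPRUCE.csv). The names demo.lm, y, fit, and the demo.* sums are hypothetical and are not part of the lab code.

```r
# Stand-in data: built-in `cars` (dist vs speed), NOT the spruce data
demo.lm <- lm(dist ~ speed, data = cars)
y   <- cars$dist
fit <- fitted(demo.lm)

demo.RSS <- sum((y - fit)^2)         # residual sum of squares
demo.MSS <- sum((mean(y) - fit)^2)   # model sum of squares
demo.TSS <- sum((y - mean(y))^2)     # total sum of squares

# For a least-squares fit with an intercept, TSS equals MSS + RSS
# up to floating-point error, so compare with all.equal() rather than ==
isTRUE(all.equal(demo.TSS, demo.MSS + demo.RSS))

# MSS/TSS should agree with the Multiple R-squared from summary()
isTRUE(all.equal(demo.MSS / demo.TSS, summary(demo.lm)$r.squared))
```

Comparing with all.equal() instead of == avoids a spurious FALSE caused by rounding in the last few bits of the floating-point sums.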
Describe the regression line.
summary(spruce.lm)
The slope of the line is 0.48147.
The y-intercept is 9.14684.
The line's equation is Height = 0.48147*BHDiameter + 9.14684.
predict(spruce.lm, data.frame(BHDiameter=c(15,18,20)))
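As a cross-check, the same predictions can be computed by hand from the fitted equation. This sketch assumes the rounded coefficients quoted above, so its results may differ from predict() in the last decimal places. The names b0, b1, new.diam, and hand.pred are illustrative only.

```r
b0 <- 9.14684               # intercept, as quoted from summary(spruce.lm)
b1 <- 0.48147               # slope, as quoted from summary(spruce.lm)
new.diam <- c(15, 18, 20)   # the diameters passed to predict()

# Height = b1 * BHDiameter + b0, evaluated for each diameter
hand.pred <- b0 + b1 * new.diam
hand.pred
```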
Create a detailed plot of the data.
library(ggplot2)
g=ggplot(spruce.df, aes(x=BHDiameter,y=Height,colour=BHDiameter))
g=g+geom_point() + geom_line()+ geom_smooth(method="lm")
g+ggtitle("Height vs BHDiameter")
Create a shiny interactive document with a plot of the spruce data.